Abstract

In this paper we consider the problem of finding the densest subset subject to co-matroid constraints. We are given a monotone supermodular set function $f$ defined over a universe $U$, and the density of a subset $S$ is defined to be $f(S)/|S|$. This generalizes the concept of graph density. Co-matroid constraints are the following: given a matroid $\mathcal{M}$, a set $S$ is feasible iff the complement of $S$ is independent in the matroid. Under such constraints, the problem becomes NP-hard. The specific case of graph density has been considered in the literature under specific co-matroid constraints, for example, the cardinality matroid and the partition matroid. We show a 2-approximation for finding the densest subset subject to co-matroid constraints. Thus, for instance, we improve the approximation guarantees for the result for partition matroids in the literature.

1 Introduction 

In this paper, we consider the problem of computing the densest subset with respect to a monotone supermodular function subject to co-matroid constraints. Given a universe $U$ of $n$ elements, a function $f : 2^U \to \mathbb{R}_+$ is supermodular iff

$$f(A) + f(B) \le f(A \cup B) + f(A \cap B)$$

for all $A, B \subseteq U$. If the sign of the inequality is reversed for all $A, B$, then we call the function submodular. The function $f$ is said to be monotone if $f(A) \le f(B)$ whenever $A \subseteq B$; we assume $f(\emptyset) = 0$. We define a density function $d : 2^U \to \mathbb{R}_+$ as $d(S) = f(S)/|S|$. Consider the problem of maximizing the density function $d(S)$ given oracle access to the function $f$. We observe that the above problem can be solved in polynomial time (see Theorem 5).
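As a concrete illustration of these definitions, the following minimal sketch takes $f(S) = |E(S)|$ on a small graph, checks supermodularity by brute force, and finds the densest subset by exhaustive search. The graph, the names `EDGES`, `UNIVERSE`, and all helper functions are assumptions made for this example only; the exhaustive search is exponential and is not the polynomial-time method referred to in the text.

```python
from itertools import combinations

# Hypothetical toy instance (not from the paper): K4 on {0,1,2,3} plus a pendant vertex 4.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
UNIVERSE = frozenset(range(5))


def f(S):
    """Number of edges induced by S, i.e. f(S) = |E(S)|."""
    return sum(1 for u, v in EDGES if u in S and v in S)


def density(S):
    """d(S) = f(S) / |S| for a nonempty set S."""
    return f(S) / len(S)


def subsets(U):
    """Yield every subset of U, including the empty set."""
    items = sorted(U)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)


def is_supermodular(U):
    """Check f(A) + f(B) <= f(A | B) + f(A & B) for every pair A, B."""
    all_sets = list(subsets(U))
    return all(f(A) + f(B) <= f(A | B) + f(A & B)
               for A in all_sets for B in all_sets)


def densest_subset(U):
    """Exhaustive search over nonempty subsets; only for tiny examples."""
    return max((S for S in subsets(U) if S), key=density)
```

On this instance the unconstrained densest subset is the clique on four vertices, which is denser than the whole graph.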

The main problem considered in this paper is to maximize $d(S)$ subject to certain constraints that we call co-matroid constraints. In this scenario, we are given a matroid $\mathcal{M} = (U, \mathcal{I})$ where $\mathcal{I} \subseteq 2^U$ is the family of independent sets (we give the formal definition of a matroid in Section 2). A set $S$ is considered feasible iff the complement of $S$ is independent, i.e. $\overline{S} \in \mathcal{I}$. The problem is to find the densest feasible subset $S$ given oracle access to $f$ and $\mathcal{M}$. We denote this problem as DEN-M.

We note that even special cases of the DEN-M problem are NP-hard [14]. The main result in this paper is the following:

► Theorem 1. Given a monotone supermodular function $f$ over a universe $U$, and a matroid $\mathcal{M}$ defined over the same universe, there is a 2-approximation algorithm for the DEN-M problem.
Alternatively one could have considered the same problem under matroid constraints (instead of co-matroid constraints). We note that this problem is significantly harder, since the Densest $k$-Subgraph problem can be reduced to special cases of this problem (see [2, 14]). The Densest $k$-Subgraph problem is notoriously hard: the best approximation factor known to date is $O(n^{1/4+\varepsilon})$ for any $\varepsilon > 0$ [3].

Special cases of the DEN-M problem have been extensively studied in the context of graph 
density, and we discuss this next. 

1.1 Comparison to Graph Density 

Given an undirected graph $G = (V, E)$, the density $d(S)$ of a subgraph on vertex set $S$ is defined as the quantity $\frac{|E(S)|}{|S|}$, where $E(S)$ is the set of edges in the subgraph induced by the vertex set $S$. The densest subgraph problem is to find the subgraph $S$ of $G$ that maximizes the density.

The concept of graph density is ubiquitous, more so in the context of social networks. 
In the context of social networks, the problem is to detect communities: collections of 
individuals who are relatively well connected as compared to other parts of the social network 
graph. 

The results relating to graph density have been fruitfully applied to finding communities in the social network graph (or even web graphs, gene annotation graphs [15], problems related to the formation of most effective teams [9], etc.). Also, note that graph density
appears naturally in the study of threshold phenomena in random graphs, see pQ. 

Motivated by applications in social networks, the graph density problem and its variants 
have been well studied. Goldberg [11] proved that the densest subgraph problem can be
solved optimally in polynomial time: he showed this via a reduction to a series of max-flow 
computations. Later, others 114) have given new proofs for the above result, motivated 
by considerations to extend the result to some generalizations and variants. 

Andersen and Chellapilla [2] studied the following generalization of the above problem. Here, the input also includes an integer $k$, and the goal is to find the densest subgraph $S$ subject to the constraint $|S| \ge k$. This corresponds to finding sufficiently large dense subgraphs in social networks. This problem is NP-hard [14]. Andersen and Chellapilla [2] gave a 2-approximation algorithm. Khuller and Saha [14] give two alternative algorithms: one of them is a greedy procedure, while the other is LP-based. Both algorithms have 2-factor guarantees.

Gajewar and Sarma [9] consider a further generalization. The input also includes a partition of the vertex set into $U_1, U_2, \dots, U_t$, and non-negative integers $r_1, r_2, \dots, r_t$. The goal is to find the densest subgraph $S$ subject to the constraint that for all $1 \le i \le t$, $|S \cap U_i| \ge r_i$. They gave a 3-approximation algorithm by extending the greedy procedure of Khuller and Saha [14].

We make the following observations: (i) The objective function is monotone and supermodular. (ii) The constraint $|S| \ge k$ (considered by [2]) is a co-matroid constraint; this corresponds to the cardinality matroid. (iii) The constraint considered by Gajewar and Sarma [9] is also a co-matroid constraint; this corresponds to the partition matroid (formal definitions are provided in Section 2). Consequently, our main result Theorem 1 improves upon the above results in three directions:

■ Objective function: Our results apply to general monotone supermodular functions $f$ instead of the specific set function $|E(S)|$ in graphs.
■ Constraints: We allow co-matroid constraints corresponding to arbitrary matroids.
■ Approximation Factor: For the problem considered by Gajewar and Sarma [9], we improve the approximation guarantee from 3 to 2. We match the best factor known for the at-least-$k$ densest subgraph problem considered in [2, 14].



Knapsack Covering Constraints: 

We also consider the following variant of the DEN-M problem. In this variant, we will have a weight $w_i$ (for $i = 1, \dots, |U|$) for every element $i \in U$, and a number $k \in \mathbb{N}$. A set $S$ of elements is feasible if and only if the following condition holds:

$$\sum_{i \in S} w_i \ge k.$$



We call this a knapsack covering constraint. We extend the proof of Theorem 1 to show the following:

► Theorem 2. Suppose we are given a monotone supermodular function $f$ over a universe $U$, weights $w_i$ for every element $i \in U$, and a number $k \in \mathbb{N}$. Then there is a 3-approximation algorithm for maximizing the density function $d(S)$ subject to knapsack covering constraints corresponding to the weights $w_i$ and the number $k$.

Dependency Constraints: 

Saha et al. [15] consider a variant of the graph density problem. In this version, we are given a specific collection of vertices $A \subseteq V$; a subset $S$ of vertices is feasible iff $A \subseteq S$. We call this restriction the subset constraint. The objective is to find the densest subgraph among subsets satisfying a subset constraint. Saha et al. [15] prove that this problem is solvable in polynomial time by reducing this problem to a series of max-flow computations.

We study a generalization of the subset constraint problem. Here, we are given a monotone supermodular function $f$ defined over universe $U$. Additionally, we are given a directed graph $D = (U, A)$ over the universe $U$. A feasible solution $S$ has to satisfy the following property: if $a \in S$, then every vertex of the digraph $D$ reachable from $a$ also has to belong to $S$. Alternatively, $a \in S$ and $(a, b) \in A$ implies that $b \in S$. We call the digraph $D$ the dependency graph and such constraints dependency constraints. The goal is to find the densest subset $S$ subject to the dependency constraints. We call this the DENdep problem. We note that the concept of dependency constraints generalizes that of the subset constraints: construct a digraph $D$ by drawing directed arcs from every vertex in $U$ to every vertex in $A$. The motivation for this problem comes from certain considerations in social networks, where we are to find the densest subgraph but with the restriction that in the solution subgraph all the members of a sub-community (say, a family) are present or absent
simultaneously. In literature, such a solution S that satisfies the dependency constraints is 
also called a closure (see [TS], Section 3.7.2). Thus our problem can be rephrased as that of 
finding the densest subset over all closures. 
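The closure condition and the reduction from subset constraints can be made concrete with a small brute-force sketch. The function names `is_closed`, `subset_constraint_arcs`, and `densest_closure` are hypothetical, introduced only for this illustration; the search is exponential and is not the polynomial-time method of Theorem 3.

```python
from itertools import combinations


def is_closed(S, arcs):
    """S is a closure iff a in S and (a, b) in arcs implies b in S."""
    return all(b in S for a, b in arcs if a in S)


def subset_constraint_arcs(universe, A):
    """Arcs from every element of the universe to every element of A.

    With these arcs, every nonempty closure contains all of A.
    """
    return [(u, a) for u in sorted(universe) for a in sorted(A)]


def densest_closure(universe, f, arcs):
    """Exhaustively search nonempty closures for the maximum of f(S) / |S|."""
    best = None
    items = sorted(universe)
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            S = frozenset(c)
            if not is_closed(S, arcs):
                continue
            if best is None or f(S) / len(S) > f(best) / len(best):
                best = S
    return best
```

For instance, with $f(S) = |E(S)|$ on a clique $\{0,1,2,3\}$ with a pendant vertex 4 attached to 3, a single arc $(0, 4)$ makes the clique itself infeasible, and the densest closure becomes the whole vertex set.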

We note that dependency constraints are incomparable with co-matroid constraints. In 
fact dependency constraints are not even upward monotone: it is not true that if S is a 
feasible subset, any superset of S is feasible. 

Our result is as follows: 

► Theorem 3. The DENdep problem is solvable in polynomial time. 

The salient features of the above result are as follows: 



1.2 Other Results 
■ While the result in [15] is specific to graph density, our result holds for density functions arising from arbitrary monotone supermodular functions.
■ Our proof of this result is LP-based. The work of [15] is based on max-flow computations. We can extend our LP-based approach (via convex programs) to the case of density functions arising from arbitrary monotone supermodular $f$, while we are not aware of how to extend the max-flow based computation.
■ The proof technique, inspired by Iwata and Nagano [13], also extends to show "small support" results: thus, for instance, we can show that for the LP considered by [14] for the at-least-$k$-densest subgraph problem, every non-zero component of any basic feasible solution is one of two values.

Combination of Constraints: 

We also explore the problem of finding the densest subset subject to a combination of the constraints considered. We are able to prove results for the problem of maximizing a density function subject to (a) co-matroid constraints and (b) subset constraints. Suppose we are given a monotone supermodular function $f$ over a universe $U$, a matroid $\mathcal{M} = (U, \mathcal{I})$, and a subset of elements $A \subseteq U$. A subset $S$ is called feasible iff (1) $S$ satisfies the co-matroid constraints with respect to $\mathcal{M}$ (i.e. $\overline{S} \in \mathcal{I}$) and (2) $S$ satisfies the subset constraint with respect to $A$ (i.e. $A \subseteq S$). We show the following:

► Theorem 4. There is a 2-approximation algorithm for the problem of maximizing the density function $d(S)$ corresponding to a monotone supermodular function $f$, subject to the co-matroid and subset constraints.

1.3 Related Work 

Recently, there has been considerable interest in the problems of optimizing submodular functions under various types of constraints. The most common constraints that are considered are matroid constraints, knapsack constraints or combinations of the two varieties. Thus for instance, Calinescu et al. [5] considered the problem of maximizing a monotone submodular function subject to a matroid constraint. They provide an algorithm and show
that it yields a (1 — l/e)-approximation: this result is essentially optimal (also see the re- 
cent paper [SJ for a combinatorial algorithm for the same). Goemans and Soto [ID] consider 
the problem of minimizing a symmetric submodular function subject to arbitrary matroid constraints. They prove the surprising result that this problem can be solved in polynomial time. In fact, their result extends to the significantly more general case of hereditary constraints; the problem of extending our results to arbitrary hereditary functions is left open.

The density functions that we consider may be considered as "close" to the notion of 
supermodular functions. To the best of our knowledge, the general question of maximizing 
density functions subject to a (co-)matroid constraint has never been considered before. 

1.4 Proof Techniques 

We employ a greedy algorithm to prove Theorems 1 and 2. Khuller and Saha [14] and Gajewar and Sarma [9] had considered a natural greedy algorithm for the problem of maximizing graph density subject to co-matroid constraints corresponding to the cardinality matroid and partition matroid respectively. Our greedy algorithm can be viewed as an abstraction of the natural greedy algorithm to the generalized scenario of arbitrary monotone supermodular functions. However, our analysis is different from that in [14, 9]: the efficacy of our analysis is reflected in the fact that we improve on the guarantees provided by [9]. While they provide a 3-approximation algorithm for the graph density problem with partition matroid constraints, we use the modified analysis to obtain a 2-factor guarantee. In
both of the earlier papers [21 [UJ , a particular stopping condition is employed to define a set 
Di useful in the analysis. For instance, in Section 4.1 of [PJ they define Dg using the optimal 
set $H^*$ directly. We choose a different stopping condition to define the set $D_\ell$; it turns out that this choice is crucial for achieving a 2-factor guarantee.

We prove Theorem 3 using LP-based techniques. In fact, we provide two proofs for the same. Both our techniques also provide alternate new proofs of the basic result that graph density is computable in polynomial time. The first proof method is inspired by Iwata and Nagano [13]. The second proof method invokes Cramer's rule to derive the conclusion.

1.5 Organization 

We present the relevant definitions in Section 2. We proceed to give the proof of Theorem 1 in Section 3, while the proof of Theorem 2 is presented in Section 4. The proof of Theorem 3 is presented in Section 5, and the proof of Theorem 4 follows it.

2 Preliminaries 

In this paper, we will use the following notation: given disjoint sets $A$ and $B$ we will use $A + B$ to serve as shorthand for $A \cup B$. Vice versa, when we write $A + B$ it will hold implicitly that the sets $A$ and $B$ are disjoint.

Monotone: A set function $f$ is called monotone if $f(S) \le f(T)$ whenever $S \subseteq T$.

Supermodular: A set function $f : 2^U \to \mathbb{R}_+$ over a universe $U$ is called supermodular if the following holds for any two sets $A, B \subseteq U$:

$$f(A) + f(B) \le f(A \cup B) + f(A \cap B)$$

If the inequality holds (for every $A, B$) with the sign reversed, then the function $f$ is called submodular. In this paper, we will use the following equivalent definition of supermodularity: given disjoint sets $A$, $B$ and $C$,

$$f(A + C) - f(A) \le f(A + B + C) - f(A + B)$$

We can think of this as follows: the marginal utility of the set of elements $C$ to the set $A$ increases as the set becomes "larger" ($A + B$ instead of $A$).
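The equivalence of the two definitions can be seen by a direct substitution. In the sketch below, $P$ and $Q$ are auxiliary names introduced here for illustration and are not notation from the text.

```latex
% Sketch: P and Q are auxiliary sets introduced only for this derivation.
\begin{align*}
P &= A + C, \qquad Q = A + B
  \qquad (A,\ B,\ C \text{ pairwise disjoint}) \\
P \cup Q &= A + B + C, \qquad P \cap Q = A \\
f(P) + f(Q) \le f(P \cup Q) + f(P \cap Q)
  &\iff f(A + C) + f(A + B) \le f(A + B + C) + f(A) \\
  &\iff f(A + C) - f(A) \le f(A + B + C) - f(A + B)
\end{align*}
```

Conversely, for arbitrary sets $P$ and $Q$ one may take $A = P \cap Q$, $C = P \setminus Q$ and $B = Q \setminus P$, which are pairwise disjoint, and read the same chain from bottom to top.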

It is well known (see [12, 17]) that supermodular functions can be maximized in polynomial time (whereas submodular functions can be minimized in polynomial time). Let us record this as:

► Theorem 5. Any supermodular function $f : 2^U \to \mathbb{R}_+$ can be maximized in polynomial time.

We also state the following folklore corollary: 

► Corollary 6. Given any supermodular function $f : 2^U \to \mathbb{R}_+$, we can find $\max_{S} \frac{f(S)}{|S|}$ in polynomial time.

For completeness, a proof of this Corollary is included in Section I5T21 

Density Function: Given a function $f$ over $U$, the density of a set $S$ is defined to be $d(S) = \frac{f(S)}{|S|}$.

Matroid: A matroid is a pair $\mathcal{M} = (U, \mathcal{I})$ where $\mathcal{I} \subseteq 2^U$, and
$i \leftarrow 1$
$H_1 \leftarrow \arg\max_{X} \frac{f(X)}{|X|}$
$D_1 \leftarrow H_1$
while $D_i$ infeasible do
    $H_{i+1} \leftarrow \arg\max_{X : X \cap D_i = \emptyset} \frac{f(D_i + X) - f(D_i)}{|X|}$
    $D_{i+1} \leftarrow D_i + H_{i+1}$
    $i \leftarrow i + 1$
end while
$L \leftarrow i$
for $i = 1 \to L$ do
    Add arbitrary vertices to $D_i$ to make it minimal feasible
    Call the result $D'_i$
end for
Output the subset among the $D'_i$'s with the highest density

Figure 1 Main Algorithm



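1. (Hereditary Property) $\forall B \in \mathcal{I},\ A \subseteq B \Rightarrow A \in \mathcal{I}$.

2. (Extension Property) $\forall A, B \in \mathcal{I} : |A| < |B| \Rightarrow \exists x \in B \setminus A : A + x \in \mathcal{I}$.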

Matroids are generalizations of vector spaces in linear algebra and are ubiquitous in combinatorial optimization because of their connection with greedy algorithms. Typically the sets in $\mathcal{I}$ are called independent sets, this being an abstraction of linear independence in linear algebra. The maximal independent sets in a matroid are called the bases (again preserving the terminology from linear algebra). An important fact for matroids is that all bases have equal cardinality; this is an outcome of the Extension Property of matroids.

Any matroid is equipped with a rank function $r : 2^U \to \mathbb{R}_+$. The rank of a subset $S$ is defined to be the size of the largest independent set contained in the subset $S$. By the Extension Property, this is well-defined. See the excellent text by Schrijver [18] for details.

Two commonly encountered matroids are the (i) Cardinality Matroid: Given a universe $U$ and $r \in \mathbb{N}$, the cardinality matroid is the matroid $\mathcal{M} = (U, \mathcal{I})$, where a set $A$ is independent (i.e. belongs to $\mathcal{I}$) iff $|A| \le r$. (ii) Partition Matroid: Given a universe $U$ and a partition of $U$ as $U_1, \dots, U_t$ and non-negative integers $r_1, \dots, r_t$, the partition matroid is $\mathcal{M} = (U, \mathcal{I})$, where a set $A$ belongs to $\mathcal{I}$ iff $|A \cap U_i| \le r_i$ for all $i = 1, 2, \dots, t$.
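The two matroids above, and the co-matroid feasibility test they induce, can be sketched as independence oracles. The function names and the representation of a partition as a list of frozensets are assumptions made for this illustration.

```python
def cardinality_independent(A, r):
    """Cardinality matroid: A is independent iff |A| <= r."""
    return len(A) <= r


def partition_independent(A, parts, caps):
    """Partition matroid: A is independent iff |A & U_i| <= r_i for every part."""
    return all(len(A & U_i) <= r_i for U_i, r_i in zip(parts, caps))


def co_matroid_feasible(S, universe, independent):
    """S is feasible iff its complement is independent in the matroid."""
    return independent(universe - S)
```

With a universe of 6 elements and $r = 2$, the cardinality co-matroid constraint is exactly $|S| \ge 4$; for a partition matroid, feasibility of $S$ amounts to $|S \cap U_i| \ge |U_i| - r_i$ in every part.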

Convex Programs: We will need the definition of a convex program, and that they can be solved to arbitrary precision in polynomial time, via the ellipsoid method (see [12]). We refer the reader to the excellent text [1].

3 Proof of Theorem 1

We first present the algorithm and then its analysis. To get started, we describe the intuition 
behind the algorithm. 

Note that co-matroid constraints are upward monotone: if a set $S$ is feasible for such constraints, then any superset of $S$ is also feasible. Thus, it makes sense to find a maximal subset of $U$ with the maximum density. In the following description of the algorithm, one may note that the sets $D_1, D_2, \dots, D_i$ are an attempt to find the maximal subset with the largest density. Given this rough outline, the algorithm is presented in Figure 1.

We note that we can find the maximum $\max_{X : X \cap D_i = \emptyset} \frac{f(D_i + X) - f(D_i)}{|X|}$ in polynomial time.



V. Chakaravarthy et. al 



This is because the function $f(A + X)$ for a fixed $A$ is supermodular (and we appeal to Corollary 6).
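The greedy procedure of Figure 1 can be sketched as follows. This is an illustrative sketch only: the arg max steps are done by exhaustive search (exponential time) rather than by the polynomial-time routine of Corollary 6, the matroid is given as a hypothetical independence oracle `independent`, and the completion step keeps a greedily built maximal independent set outside $D_i$, which is one way to make $D_i$ minimal feasible.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every nonempty subset of items, smallest sets first."""
    items = sorted(items)
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)


def extend_to_feasible(D, universe, independent):
    """Grow D to a minimal feasible superset.

    A maximal independent set is built greedily among the elements
    outside D; everything else is added to D.
    """
    outside = frozenset()
    for x in sorted(universe - D):
        if independent(outside | {x}):
            outside = outside | {x}
    return universe - outside


def greedy_densest(universe, f, independent):
    """Sketch of Figure 1 for a monotone supermodular f with f(empty) = 0."""
    def feasible(S):
        return independent(universe - S)

    D = frozenset()          # plays the role of D_0 = empty set
    prefixes = []            # D_1, D_2, ..., D_L
    while True:
        rest = universe - D
        # H_{i+1}: best marginal gain per element among sets disjoint from D_i
        H = max(nonempty_subsets(rest),
                key=lambda X: (f(D | X) - f(D)) / len(X))
        D = D | H
        prefixes.append(D)
        if feasible(D):
            break
    completed = [extend_to_feasible(P, universe, independent) for P in prefixes]
    return max(completed, key=lambda S: f(S) / len(S))
```

Because $f(\emptyset) = 0$, the first iteration of the loop coincides with the choice of $H_1$ as the densest subset, so a separate initial step is not needed in this sketch.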

Let $H^*$ denote the optimal solution, i.e. the subset that maximizes the density $d(S)$ subject to the co-matroid constraints. Let $d^*$ denote the optimal density, so that $f(H^*) = d^* \cdot |H^*|$.

We can make the following easy claim: 

► Claim 1. The subset $D_1$ obeys the inequality $d(D_1) \ge d^*$.

This is because $D_1$ is the densest subset in the universe $U$, while $d^*$ is the density of a specific subset $H^*$.

In the following, we will have occasion to apply the following lemmas. 

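► Lemma 7. Let $a, b, c, d, \theta \in \mathbb{R}_+$ be such that the inequalities $\frac{a}{b} \ge \theta$ and $\frac{c}{d} \ge \theta$ hold. Then it is true that $\frac{a+c}{b+d} \ge \theta$. Thus, if $\frac{a}{b} \ge \frac{c}{d}$, then $\frac{a+c}{b+d} \ge \frac{c}{d}$ (by setting $\theta = \frac{c}{d}$).

Also,

► Lemma 8. Let $a, b, c, d \in \mathbb{R}_+$ be real numbers such that $\frac{a}{b} \ge \frac{c}{d}$ holds.
■ Suppose $a \ge c$, $b \ge d$. Then the inequality $\frac{a-c}{b-d} \ge \frac{a}{b}$ holds.
■ Suppose $c \ge a$, $d \ge b$. Then the inequality $\frac{c-a}{d-b} \le \frac{c}{d}$ holds.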
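Both lemmas are elementary facts about ratios and can be sanity-checked numerically. The sketch below uses exact rational arithmetic on random positive integers; the function names, the sampling range, and the requirement that denominators be strictly positive are assumptions of this example, and a random check is of course not a proof.

```python
from fractions import Fraction
import random


def lemma7_holds(a, b, c, d):
    """(a + c) / (b + d) is at least theta = min(a/b, c/d)."""
    theta = min(Fraction(a, b), Fraction(c, d))
    return Fraction(a + c, b + d) >= theta


def lemma8_first_holds(a, b, c, d):
    """Assumes a/b >= c/d, a >= c, b > d: (a - c)/(b - d) >= a/b."""
    return Fraction(a - c, b - d) >= Fraction(a, b)


def lemma8_second_holds(a, b, c, d):
    """Assumes a/b >= c/d, c >= a, d > b: (c - a)/(d - b) <= c/d."""
    return Fraction(c - a, d - b) <= Fraction(c, d)


def check_lemmas(trials=2000, seed=0):
    """Test both lemmas on random positive integers; returns True if all pass."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c, d = (rng.randint(1, 50) for _ in range(4))
        if not lemma7_holds(a, b, c, d):
            return False
        if Fraction(a, b) >= Fraction(c, d):
            if a >= c and b > d and not lemma8_first_holds(a, b, c, d):
                return False
            if c >= a and d > b and not lemma8_second_holds(a, b, c, d):
                return False
    return True
```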

We make the following claim: 

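► Claim 2. The sequence of subsets $D_1, D_2, \dots, D_L$ obeys the following ordering:

$$\frac{f(D_1)}{|D_1|} \ge \frac{f(D_2) - f(D_1)}{|D_2| - |D_1|} \ge \dots \ge \frac{f(D_{i+1}) - f(D_i)}{|D_{i+1}| - |D_i|} \ge \dots \ge \frac{f(D_L) - f(D_{L-1})}{|D_L| - |D_{L-1}|}$$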

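Proof. Consider any term in this sequence, say $\frac{f(D_{i+1}) - f(D_i)}{|D_{i+1}| - |D_i|}$. Note that $H_{i+1}$ was chosen as the arg max of $\frac{f(D_i + X) - f(D_i)}{|X|}$. Therefore, $\max_X \frac{f(D_i + X) - f(D_i)}{|X|} = \frac{f(D_{i+1}) - f(D_i)}{|D_{i+1}| - |D_i|}$. Hence this quantity is at least as large as $\frac{f(D_{i+2}) - f(D_i)}{|D_{i+2}| - |D_i|}$ (as long as $D_{i+2}$ is well defined). Now from the second part of Lemma 8 we get that

$$\frac{f(D_{i+1}) - f(D_i)}{|D_{i+1}| - |D_i|} \ge \frac{f(D_{i+2}) - f(D_i)}{|D_{i+2}| - |D_i|} \ge \frac{f(D_{i+2}) - f(D_{i+1})}{|D_{i+2}| - |D_{i+1}|}$$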

Via an application of Lemma 7 we then have:

► Claim 3. Given any $i$ ($1 \le i \le L$), the following holds:

$$\frac{f(D_i)}{|D_i|} \ge \frac{f(D_i) - f(D_{i-1})}{|D_i| - |D_{i-1}|}$$

Proof. We will prove the statement by induction.

Base Case: We implicitly assume that $D_0 = \emptyset$, and hence the case for $i = 1$ holds.
Induction Step: Assume the statement by induction for $i = k$, and we prove it for $i = k + 1$. Thus, by hypothesis we have

$$\frac{f(D_k)}{|D_k|} \ge \frac{f(D_k) - f(D_{k-1})}{|D_k| - |D_{k-1}|}$$

Now by Claim 2 we have that

$$\frac{f(D_k) - f(D_{k-1})}{|D_k| - |D_{k-1}|} \ge \frac{f(D_{k+1}) - f(D_k)}{|D_{k+1}| - |D_k|}$$



8 Density Functions 



Thus,

$$\frac{f(D_k)}{|D_k|} \ge \frac{f(D_{k+1}) - f(D_k)}{|D_{k+1}| - |D_k|}$$

Applying Lemma 7 we get:

$$\frac{f(D_{k+1})}{|D_{k+1}|} \ge \frac{f(D_{k+1}) - f(D_k)}{|D_{k+1}| - |D_k|}$$

Thus we have proven the Claim by induction. ◄

The analysis will be broken up into two parts. We will consider the set $D_\ell$ in the sequence $D_1, D_2, \dots, D_L$ such that the following hold:

$$\frac{f(D_\ell) - f(D_{\ell-1})}{|D_\ell| - |D_{\ell-1}|} \ge \frac{d^*}{2}$$

but

$$\frac{f(D_{\ell+1}) - f(D_\ell)}{|D_{\ell+1}| - |D_\ell|} < \frac{d^*}{2}$$

Since $d(D_1) \ge d^*$ by Claim 1, such an $\ell$ will exist or $\ell = L$. If $\ell = L$, then we have a feasible solution $D_L$ with the property that $\frac{f(D_L) - f(D_{L-1})}{|D_L| - |D_{L-1}|} \ge \frac{d^*}{2}$. Therefore, by Claim 3 we have that $d(D_L) \ge \frac{d^*}{2}$ and we are done in this case.

So we may assume that $\ell < L$ so that $D_\ell$ is not feasible. In this case, we will prove that $D'_\ell$ has the correct density, i.e. that $d(D'_\ell) \ge \frac{d^*}{2}$.

To this end, we will prove two facts about $D_\ell$ that will yield the desired result:

► Claim 4.

$$f(D_\ell) - f(D_\ell \cap H^*) \ge \frac{d^*}{2}\left(|D_\ell| - |D_\ell \cap H^*|\right)$$

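Proof. Note that $D_\ell = H_1 + H_2 + \dots + H_\ell$. For brevity, for $1 \le i \le \ell$, denote $H_i \cap H^*$ as $A_i$ (thus, $A_i \subseteq H_i$ for every $i$). Thus, $D_\ell \cap H^* = A_1 + A_2 + \dots + A_\ell$.

We will prove the following statement by induction on $i$ (for $1 \le i \le \ell$):

$$f(H_1 + H_2 + \dots + H_i) - f(A_1 + A_2 + \dots + A_i) \ge \frac{d^*}{2}\left(|H_1 + H_2 + \dots + H_i| - |A_1 + A_2 + \dots + A_i|\right)$$

Base Case: For $i = 1$, we have to prove that:

$$\frac{f(H_1) - f(A_1)}{|H_1| - |A_1|} \ge \frac{d^*}{2}$$

Since $H_1$ is the densest subset, we have

$$\frac{f(H_1)}{|H_1|} \ge \frac{f(A_1)}{|A_1|}$$

and we may apply (the first part of) Lemma 8 to obtain the desired inequality, since $\frac{f(H_1) - f(A_1)}{|H_1| - |A_1|} \ge \frac{f(H_1)}{|H_1|} \ge d^* \ge \frac{d^*}{2}$.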

Induction Step: Assume the statement to be true for $i$, and we will prove it for $i + 1$. Consider the following chain:

$$\frac{f(H_1 + \dots + H_i + H_{i+1}) - f(H_1 + \dots + H_i)}{|H_{i+1}|} \ge \frac{f(H_1 + \dots + H_i + A_{i+1}) - f(H_1 + \dots + H_i)}{|A_{i+1}|} \ge \frac{f(A_1 + \dots + A_i + A_{i+1}) - f(A_1 + \dots + A_i)}{|A_{i+1}|}$$
We would now like to apply Lemma 8 to the first and last terms in the above chain. To this end, let us check the preconditions:

$$f(H_1 + \dots + H_i + H_{i+1}) - f(H_1 + \dots + H_i) \overset{\text{monotone}}{\ge} f(H_1 + \dots + H_i + A_{i+1}) - f(H_1 + \dots + H_i) \overset{\text{supermodular}}{\ge} f(A_1 + \dots + A_i + A_{i+1}) - f(A_1 + \dots + A_i)$$

Also, clearly, $|H_{i+1}| \ge |A_{i+1}|$.

Thus, the preconditions in Lemma 8 hold, and we have that

$$\frac{f(H_1 + \dots + H_{i+1}) - f(A_1 + \dots + A_{i+1}) - f(H_1 + \dots + H_i) + f(A_1 + \dots + A_i)}{|H_{i+1}| - |A_{i+1}|} \ge \frac{f(H_1 + \dots + H_i + H_{i+1}) - f(H_1 + \dots + H_i)}{|H_{i+1}|} \ge \frac{d^*}{2}$$

Applying Lemma 7 to the first term in the above chain and the induction statement for $i$, we obtain the desired result for $i + 1$. Hence done. ◄

The next claim lower bounds the value $f(D_\ell \cap H^*)$.

Building up to the Claim, let us note that $D_\ell \cap H^* \ne \emptyset$. If the intersection were empty, then $H^*$ is a subgraph of density $d^*$, and so $H_{\ell+1}$ would be a subgraph of density at least $d^*$. But then,

$$\frac{f(D_\ell + H_{\ell+1}) - f(D_\ell)}{|H_{\ell+1}|} \overset{\text{supermodular}}{\ge} \frac{f(H_{\ell+1})}{|H_{\ell+1}|} > \frac{d^*}{2}$$

But this contradicts the choice of $D_\ell$.
► Claim 5.

$$f(D_\ell \cap H^*) \ge \frac{d^*}{2}|D_\ell \cap H^*| + \frac{d^*}{2}|H^*|$$



Proof. Let $X = H^* - D_\ell \cap H^*$. Then, $X \cap D_\ell = \emptyset$ and $D_\ell + X = D_\ell \cup H^*$. Then by definition of $D_\ell$, we know that

$$\frac{f(D_\ell + X) - f(D_\ell)}{|X|} \le \frac{f(D_{\ell+1}) - f(D_\ell)}{|D_{\ell+1}| - |D_\ell|} < \frac{d^*}{2}.$$

Thus, $f(D_\ell \cup H^*) - f(D_\ell) < \frac{d^*}{2}\left(|H^*| - |D_\ell \cap H^*|\right)$.

Therefore, $f(D_\ell \cup H^*) + f(D_\ell \cap H^*) \le f(D_\ell) + f(D_\ell \cap H^*) + \frac{d^*}{2}\left(|H^*| - |D_\ell \cap H^*|\right)$.

Applying supermodularity we have that $f(D_\ell \cup H^*) + f(D_\ell \cap H^*) \ge f(D_\ell) + f(H^*)$. Thus, cancelling $f(D_\ell)$ gives us that $f(D_\ell \cap H^*) + \frac{d^*}{2}\left(|H^*| - |D_\ell \cap H^*|\right) \ge f(H^*)$. The claim follows by observing that $d^* = \frac{f(H^*)}{|H^*|}$. ◄

Note that this claim also implies that the density of the set $D_\ell \cap H^*$ is at least $d^*$. Intuitively, $D_\ell \cap H^*$ is a subset that has "enough $f$-value" as well as a "good" density.
We may now combine the statements of Claim 4 and Claim 5 to get the following chain of inequalities:

$$f(D_\ell) \overset{\text{Claim 4}}{\ge} f(D_\ell \cap H^*) + \frac{d^*}{2}|D_\ell| - \frac{d^*}{2}|D_\ell \cap H^*| \overset{\text{Claim 5}}{\ge} \frac{d^*}{2}|D_\ell| + \frac{d^*}{2}|H^*|$$

Consider $D'_\ell$: this is obtained from $D_\ell$ by adding suitably many elements to make $D_\ell$ feasible. Let $r$ be the minimum number of elements to be added to $D_\ell$, so as to make it feasible. Since $H^*$ is a feasible solution too, clearly $r \le |H^*|$. With this motivation, we define the Extension Problem for a matroid $\mathcal{M}$. The input is a matroid $\mathcal{M} = (U, \mathcal{I})$ and a subset $A \subseteq U$. The goal is to find a subset $T$ of minimum cardinality such that $\overline{A \cup T} \in \mathcal{I}$. Lemma 9 shows that we can find such a subset $T$ in polynomial time. Thus, we would have that:

$$d(D'_\ell) = \frac{f(D'_\ell)}{|D_\ell| + r} \ge \frac{f(D_\ell)}{|D_\ell| + r} \ge \frac{f(D_\ell)}{|D_\ell| + |H^*|} \ge \frac{d^*}{2}$$

and we are done with the proof of Theorem 1 modulo the proof of Lemma 9. We proceed to present the lemma and its proof:

► Lemma 9. The Extension Problem for matroid $\mathcal{M}$ and subset $A$ can be solved in polynomial time.

Proof. The proof considers the base polyhedron of the matroid (see the text by Schrijver [18]). We will have a variable $x_i$ for each element $i \in U \setminus A$, where $x_i = 1$ would indicate that we pick the element $i$ in our solution $T$. For brevity, we will also maintain a variable $y_i$ that indicates whether $i$ is absent from the solution $T$. Thus for every $i$, we will maintain that $x_i + y_i = 1$. Given an arbitrary set $S$, we will let $r(S)$ denote the rank of the subset $S$ in the matroid $\mathcal{M}$.

The following is a valid integer program for the Extension Problem (where $y(S)$ is shorthand for $\sum_{i \in S} y_i$). The linear program $LP_1$ is the relaxation of the integer program $IP_1$, with the variables $x_i$ eliminated.



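$$
IP_1: \quad
\begin{aligned}
\min\ & \sum_{i \in U} x_i \\
\text{s.t.}\ & x_i + y_i = 1 && \text{for all } i \in U \\
& y(S) \le r(S) && \text{for all } S \subseteq U \\
& x_i = 1 && \text{for all } i \in A \\
& x_i, y_i \in \{0, 1\} && \text{for all } i \in U
\end{aligned}
$$

$$
LP_1: \quad
\begin{aligned}
\min\ & \sum_{i \in U} (1 - y_i) \\
\text{s.t.}\ & y(S) \le r(S) && \text{for all } S \subseteq U \\
& y_i = 0 && \text{for all } i \in A \\
& y_i \ge 0 && \text{for all } i \in U
\end{aligned}
$$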



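The linear program $LP_1$ can also be formulated as a maximization question. To be precise, let $VAL(LP_1)$ denote the value of the program $LP_1$. Then $VAL(LP_1) = |U| - VAL(LP_2)$, where $LP_2$ is as follows:

$$
LP_2: \quad
\begin{aligned}
\max\ & \sum_{i \in U} y_i \\
\text{s.t.}\ & y(S) \le r(S) && \text{for all } S \subseteq U \\
& y_i = 0 && \text{for all } i \in A \\
& y_i \ge 0 && \text{for all } i \in U
\end{aligned}
$$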

Now, by folklore results in matroid theory (cf. [H>]), we have that solutions to LP2 are 
integral and can be found by a greedy algorithm. Thus, we can solve $IP_1$ in polynomial time, and this proves the statement of the Lemma. ◄
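In combinatorial terms, the greedy algorithm mentioned in the proof amounts to building a maximal independent set among the elements outside $A$ and adding everything else. The following is a minimal sketch under the assumption that the matroid is available as an independence oracle; `solve_extension` and `independent` are hypothetical names used only here.

```python
def solve_extension(A, universe, independent):
    """Return a minimum-size T such that the complement of (A | T) is independent.

    Y is grown greedily into a maximal independent subset of universe - A.
    In a matroid all such maximal subsets have the same size, so the
    leftover set T = (universe - A) - Y has minimum cardinality.
    """
    Y = frozenset()
    for x in sorted(universe - A):
        if independent(Y | {x}):
            Y = Y | {x}
    return (universe - A) - Y
```

For a partition matroid with parts $\{0,1,2\}$ and $\{3,4,5\}$ and capacities 1 and 2, starting from $A = \{0\}$ the sketch keeps $\{1, 3, 4\}$ outside and returns $T = \{2, 5\}$.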
$i \leftarrow 1$
$H_1 \leftarrow \arg\max_{X} \frac{f(X)}{|X|}$
$D_1 \leftarrow H_1$
while $D_i$ infeasible do
    $H_{i+1} \leftarrow \arg\max_{X : X \cap D_i = \emptyset} \frac{f(D_i + X) - f(D_i)}{|X|}$
    $D_{i+1} \leftarrow D_i + H_{i+1}$
    $i \leftarrow i + 1$
end while
$L \leftarrow i$
for $i = 1 \to L$ do
    Order the vertices $i$ in $U \setminus D_i$ by non-increasing order of weights $w_i$
    Add vertices from $U \setminus D_i$ in this order to $D_i$ until feasibility is attained
    Let the result be $D'_i$
end for
Output the subset among the $D'_i$'s with the highest density

Figure 2 Algorithm for Knapsack constraints



4 Proof of Theorem 2



In order to prove this result, we will have to modify the algorithm presented in Section 3. In the analysis, we will correspondingly modify the definition of the set $D_\ell$. Then we will apply (modified versions of) Claim 4 and Claim 5 to derive the result. The modified algorithm is as shown in Figure 2.

Consider the set $D_\ell$ in the sequence $D_1, D_2, \dots, D_L$ such that the following hold:

$$\frac{f(D_\ell) - f(D_{\ell-1})}{|D_\ell| - |D_{\ell-1}|} \ge \frac{d^*}{3}$$

but

$$\frac{f(D_{\ell+1}) - f(D_\ell)}{|D_{\ell+1}| - |D_\ell|} < \frac{d^*}{3}$$

As earlier, if there is no such $\ell < L$ for which this holds, this implies that $D_L$ satisfies $\frac{f(D_L) - f(D_{L-1})}{|D_L| - |D_{L-1}|} \ge \frac{d^*}{3}$. But this gives a 3-approximation in this case since $D_L$ is feasible and

$$d(D_L) \ge \frac{f(D_L) - f(D_{L-1})}{|D_L| - |D_{L-1}|}$$

Let us consider the other case where $\ell < L$ and $D_\ell$ is infeasible. Let $H^*$ denote the optimal solution.

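We state modified versions of Claim 4 and Claim 5.

► Claim 6. (modified Claim 4)

$$f(D_\ell) - f(D_\ell \cap H^*) \ge \frac{d^*}{3}\left(|D_\ell| - |D_\ell \cap H^*|\right)$$

► Claim 7. (modified Claim 5)

$$f(D_\ell \cap H^*) \ge \frac{d^*}{3}|D_\ell \cap H^*| + \frac{2d^*}{3}|H^*|$$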
These modified claims may be proven analogously to the original claims, taking into account the new definition for $D_\ell$.

Now note that given a set $D_\ell$, in order to make the set feasible for the knapsack cover constraint, we pick the elements with the largest weights $w_i$ so that feasibility is attained. The usual knapsack greedy algorithm shows that this is a 2-approximation to the optimal knapsack cover. Thus, if we add $r$ elements, then $r \le 2|H^*|$. Thus we have that,

$$\frac{f(D'_\ell)}{|D'_\ell|} \ge \frac{f(D_\ell)}{|D_\ell| + r} \ge \frac{f(D_\ell)}{|D_\ell| + 2|H^*|} \ge \frac{d^*}{3}$$

Thus we have proven a 3-approximation.
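The completion step of Figure 2 can be sketched as follows. The function name `knapsack_cover_completion` and the representation of the weights as a dictionary keyed by element are assumptions of this example; ties between equal weights are broken by element label, a choice the text does not specify.

```python
def knapsack_cover_completion(D, weights, k):
    """Add the heaviest remaining elements to D until the total weight is >= k.

    weights maps every element of the universe to its weight w_i.
    Raises ValueError if even the whole universe weighs less than k.
    """
    S = set(D)
    total = sum(weights[i] for i in S)
    # Remaining elements in non-increasing order of weight.
    remaining = sorted(set(weights) - S, key=lambda i: (-weights[i], i))
    for i in remaining:
        if total >= k:
            break
        S.add(i)
        total += weights[i]
    if total < k:
        raise ValueError("knapsack covering constraint cannot be satisfied")
    return frozenset(S)
```

For example, with weights $\{0: 5,\ 1: 1,\ 2: 4,\ 3: 2\}$, $D = \{1\}$ and $k = 8$, the sketch adds elements 0 and 2 and stops with total weight 10.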



5 Proof of Theorem [3] 

We will present the proof for the case of the graph density function, i.e. where $f(S) = |E(S)|$. The proof for arbitrary $f$ will require a passage to the Lovász Extension $\mathcal{L}_f(x)$ of a set function $f(S)$. In fact we will present two proofs of this fact for the special case of the graph density function. To the best of our knowledge, both proofs are new, and seem simpler than existing proofs. For both proofs, we will use the same LP.
First Proof: 

We will augment the LP that Charikar [B] uses to prove that graph density is computable 
in polynomial time. Given a graph $G = (V, E)$, there are edge variables $y_e$ and vertex variables $x_i$ in the LP. We are also given an auxiliary dependency digraph $D = (V, A)$ on the vertex set $V$. In the augmented LP, we also have constraints $x_i \le x_j$ if there is an arc from $i$ to $j$ in the digraph $D = (V, A)$. The DENdep problem is modelled by the linear program $LP_3$.



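$$
LP_3: \quad
\begin{aligned}
\max\ & \sum_{e \in E} y_e \\
\text{s.t.}\ & \sum_i x_i = 1 \\
& y_e \le x_i && \text{for all } e \sim i,\ e \in E \\
& x_i \le x_j && \text{for all } (i, j) \in A \\
& x_i \ge 0 && \text{for all } i \in V(G)
\end{aligned}
$$

$$
CP_1: \quad
\begin{aligned}
\max\ & \sum_{e = (i, j) \in E} \min\{x_i, x_j\} \\
\text{s.t.}\ & \sum_i x_i = 1 \\
& x_i \le x_j && \text{for all } (i, j) \in A \\
& x_i \ge 0 && \text{for all } i \in V(G)
\end{aligned}
$$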



Suppose we are given an optimal solution $H^*$ to the DENdep problem. Let $VAL(LP_3)$ denote the optimal value of this LP: we will prove that $VAL(LP_3) = d(H^*)$.

$VAL(LP_3) \ge d(H^*)$:

We let $|H^*| = \ell$, and $x_i = 1/\ell$ for $i \in H^*$, and $0$ otherwise. Likewise, we set $y_e = 1/\ell$ for $e \in E(H^*)$, and $0$ otherwise. Note that $H^*$ is feasible, so if $a \in H^*$ and $(a, b) \in A$, then it also holds that $b \in H^*$. We may check that the assignment $x$ and $y$ is feasible for the LP. So, $d(H^*) = \frac{|E(H^*)|}{\ell}$ is achieved as the value of a feasible assignment to the LP.

$VAL(LP_3) \le d(H^*)$:

In the rest of the proof, we will prove that there exists a subgraph $H$ such that $VAL(LP_3) \le d(H)$. First, it is easy to observe that in any optimal solution of the above LP, the variables $y_e$ will take the values $\min\{x_i, x_j\}$ where $e = (i, j)$. Thus, we may eliminate the variables $y_e$ from the program $LP_3$ to obtain the program $CP_1$. We claim that $CP_1$ is a convex program. Given two concave functions, the min operator preserves concavity. Thus, the objective function of the above modified program is concave. Hence we have a convex program: here, the objective to be maximized is concave, subject to linear constraints. We may solve the
program $CP_1$ and get an output optimal solution $x^*$. Relabel the vertices of $V$ such that the following holds: $x^*_1 \ge x^*_2 \ge \dots \ge x^*_n$. If there are two vertices with (modified) indices $a$ and $b$ where $a < b$ and there is an arc $(a, b) \in A$, then we have the equalities $x^*_a = x^*_{a+1} = \dots = x^*_b$. We will replace the inequalities in the program $CP_1$ as follows:

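$$
LP_4: \quad
\begin{aligned}
\max\ & \sum_{e = (i, j) \in E :\, i < j} x_j \\
\text{s.t.}\ & \sum_i x_i = 1 \\
& x_i \ge x_{i+1} && \text{for all } i \in \{1, 2, \dots, (n - 1)\} \\
& x_n \ge 0
\end{aligned}
$$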

where some of the inequalities $x_i \ge x_{i+1}$ are replaced by equalities, namely whenever
there is an index $a$ with $a \le i$ and an index $b$ with $b \ge (i+1)$ such that $(a,b) \in A$.
Note also that because of the ordering of the variables of this LP, the objective function
simplifies and becomes a linear function. Clearly $x^*$ is a feasible solution to this LP. Thus
the value of this LP is no less than the value of CP1. Consider a BFS $x$ to LP4. The program
LP4 has $(n+1)$ constraints and $n$ variables. Given the BFS $x$, call a constraint non-tight if
it does not hold with equality under the solution $x$. Since a BFS makes $n$ linearly
independent constraints tight, there is at most one non-tight constraint in LP4. In other
words, at most one of the constraints $x_i \ge x_{i+1}$ and $x_n \ge 0$ is a strict inequality. This,
in turn, implies that all the non-zero values in $x$ are equal. Let there be $\ell$ such non-zero
values. From the equality $\sum_i x_i = 1$, we get that each non-zero $x_i = 1/\ell$. Let $H$ denote
the set of indices $i$ which have non-zero $x_i$ values. Then the objective value corresponding
to this BFS $x$ is $|E(H)|/\ell = d(H)$.

Thus we have proven that $d(H) \ge$ VAL(LP4) $\ge$ VAL(CP1) $=$ VAL(LP3), as required.
This completes the proof of Theorem [3].

◄
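As a computational companion to this proof, one may scan the level sets of a feasible point instead of computing a BFS of LP4. This is a sketch under assumptions, not the procedure used in the proof: after sorting, the CP1 objective equals $\sum_k k(x_k - x_{k+1}) \cdot d(S_k)$ with $S_k$ the set of the $k$ largest coordinates and $x_{n+1} = 0$, and the weights $k(x_k - x_{k+1})$ sum to 1, so the densest level set has density at least the objective value. The names `cp1_value` and `best_level_set` are hypothetical.

```python
from fractions import Fraction

def cp1_value(x, edges):
    """Objective of CP1: sum over edges of min(x_i, x_j)."""
    return sum(min(x[i], x[j]) for (i, j) in edges)

def best_level_set(x, edges):
    """Return (H, d(H)) for the densest level set {i : x[i] >= t} with t > 0.

    Only prefixes of the sorted order that end at a strict drop in value are
    level sets; these are the prefixes with positive weight in the convex
    combination, and they respect every constraint x_i <= x_j.
    """
    order = sorted(x, key=lambda i: x[i], reverse=True)
    n = len(order)
    best, best_density = None, None
    for k in range(1, n + 1):
        next_value = x[order[k]] if k < n else 0
        if x[order[k - 1]] == next_value:
            continue  # tie with the next coordinate: not a level set
        S = set(order[:k])
        m = sum(1 for (i, j) in edges if i in S and j in S)
        density = Fraction(m, k)
        if best_density is None or density > best_density:
            best, best_density = S, density
    return best, best_density

# Toy instance: K4 on {1, 2, 3, 4} plus a pendant vertex 5 attached to 4.
E = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
x = {1: Fraction(3, 11), 2: Fraction(3, 11), 3: Fraction(2, 11),
     4: Fraction(2, 11), 5: Fraction(1, 11)}
H, dH = best_level_set(x, E)
```

Here the chosen point is merely feasible, not optimal, yet the scan already recovers the copy of $K_4$, whose density exceeds the objective value at $x$.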

Remarks about the proof:

■ We remark that the objective in the convex program CP1 is precisely the Lovász
Extension $\mathcal{L}_f(x)$ for the specific function $f(S) = |E(S)|$. Thus our proof shows that the LP
provided by Charikar [B] is precisely the Lovász Extension for the specific supermodular
function $|E(S)|$.

■ Note that there are other proofs possible for this result. For instance, one can follow the
basic argument of Charikar to show that LP3 satisfies $d(H^*) =$ VAL(LP3). The proof
we provide above is new, and is inspired by the work of Iwata and Nagano.

■ Via our proof, we also prove that any BFS for the basic graph density LP has the property
that all the non-zero values are equal. This fact is not new: it was proven by Khuller
and Saha [Tl], but we believe our proof of this fact is more transparent.

Second Proof: 

Again, let us consider the program LP3. 

Similar to the above, it will suffice to prove that the BFS solutions to this LP have the 
property that all non-zero components are equal. 

Consider the constraint matrix $B$ that consists of the LHS of the non-trivial constraints in
the above LP, without the constraint $\sum_i x_i = 1$. Thus $B$ consists of rows for the constraints
$y_e \le x_i$ (for $e \in E$ with $e \sim i$) and the constraints $x_i \le x_j$ for $(i,j) \in A$. The
matrix $B$ is totally unimodular (TUM): each row contains exactly one $+1$ and one $-1$, so $B$
can easily be realised as the (transposed) incidence matrix of a digraph.

Thus the original constraint matrix consists of the matrix $B$ augmented by a single
(non-trivial) constraint, stating that the sum of the $x_i$'s equals 1, and also the (trivial)
nonnegativity constraints $x_i \ge 0$ and $y_e \ge 0$. Let $B'$ denote this augmented matrix. Note
that $B'$ need not be TUM.

Consider a basic feasible solution (BFS) $v = (y_1, \dots, y_e, \dots, y_m, x_1, \dots, x_i, \dots, x_n)$.
Since $v$ consists of $(m+n)$ variables, there are $(m+n)$ linearly independent constraints in
the constraint matrix $B'$ that are tight. Consider the submatrix $T$ formed by these tight
constraints in the matrix $B'$. The constraint $\sum_i x_i = 1$ is always tight, and it must be
included as a row in the matrix $T$, since otherwise the system would force $v = 0$. Without
loss of generality, let this row be the last row $r$ of $T$. Thus, $v$ is the unique solution to the
linear system $Tv = b$, where $b^T = (0, 0, \dots, 0, 1)$.

Note that, by previous considerations, the submatrix $T'$ of the matrix $T$ consisting of all
the rows of $T$ but the last one is TUM: its rows are rows of the matrix $B$ or unit rows coming
from the nonnegativity constraints, and appending unit rows preserves total unimodularity.
The $s$-th component (for $1 \le s \le (m+n)$) of $v$ may be found by Cramer's rule as
$v_s = \det(T_s)/\det(T)$, where $T_s$ is the matrix $T$ with the $s$-th column replaced by the
vector $b$.

Note that $|\det(T)|$ is at most $|V(G)| = n$. This is because the row $r$ has exactly $n$ 1's,
so we may expand the determinant along row $r$. Any sub-determinant to be computed in this
row-wise expansion is the determinant of a square submatrix of $T'$, and thus is 0, $+1$ or
$-1$. Therefore, $\det(T)$ is a sum of at most $n$ terms, each $+1$ or $-1$; call this value $k$,
where $|k| \le n$ and $k \ne 0$ because $T$ is nonsingular.

Consider the computation of $\det(T_s)$. The matrix $T_s$ has its $s$-th column replaced by the
vector $b$, which has precisely one 1, located in row $r$. So we may expand the determinant of
$T_s$ along its $s$-th column, and thereby the determinant is, up to sign, that of a square
submatrix of the matrix $T'$. This means that $\det(T_s)$ is 0, 1 or $-1$.

Thus, every component of $v$ is precisely $0$, $1/k$, or $-1/k$. However, every component in the
LP is $\ge 0$, so only one of the two non-zero values can occur, and all non-zero components
are equal. This completes the proof. ◄

5.1 Arbitrary monotone supermodular functions: 

We now proceed to consider the case where we are given an arbitrary monotone supermodular
function $f$ over the universe $U$ and a directed graph $D = (U, A)$, where the arcs in $A$ specify
the dependencies.

To extend our results to this setting, we will need the concept of the Lovász Extension.
The Lovász Extension $\mathcal{L}_f : [0,1]^U \to \mathbb{R}$, first defined by Lovász, is an extension of an
arbitrary set function $f : 2^U \to \mathbb{R}$. We proceed with the formal definition:

► Definition 10. (Lovász Extension) Fix $x \in [0,1]^U$, and let $U = \{v_1, v_2, \dots, v_n\}$ be such
that $x(v_1) \ge x(v_2) \ge \dots \ge x(v_n)$. For $0 \le i \le n$, let $S_i = \{v_1, v_2, \dots, v_i\}$. Let
$\{\lambda_i\}_{i=0}^{n}$ be the unique coefficients with $\lambda_i \ge 0$ and $\sum_i \lambda_i = 1$ such that:

\[
x = \sum_{i=0}^{n} \lambda_i 1_{S_i}
\]

It is easy to see that $\lambda_n = x(v_n)$, and for $0 < i < n$, we have $\lambda_i = x(v_i) - x(v_{i+1})$ and
$\lambda_0 = 1 - x(v_1)$. The value of the Lovász Extension of $f$ at $x$ is defined as

\[
\mathcal{L}_f(x) = \sum_{i=0}^{n} \lambda_i f(S_i)
\]

For motivation behind the definition, refer to the excellent survey on submodular
functions by Dughmi [7].
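Definition 10 translates directly into a short evaluation routine. The sketch below is an illustration, with `lovasz_extension` and the toy graph being hypothetical names and data: it sorts the coordinates, forms the coefficients $\lambda_i$, and sums $\lambda_i f(S_i)$. On the edge-counting function $f(S) = |E(S)|$ the result should agree with $\sum_{e=(i,j)} \min\{x_i, x_j\}$.

```python
from fractions import Fraction

def lovasz_extension(f, x):
    """Evaluate the Lovász Extension of the set function f at the point x.

    x maps each element of a nonempty universe to a value in [0, 1];
    f takes a frozenset and returns a number.
    """
    order = sorted(x, key=lambda v: x[v], reverse=True)
    n = len(order)
    # lambda_0 = 1 - x(v_1), paired with S_0 = empty set.
    value = (1 - x[order[0]]) * f(frozenset())
    for i in range(1, n + 1):
        next_value = x[order[i]] if i < n else 0
        lam = x[order[i - 1]] - next_value  # lambda_i; lambda_n = x(v_n)
        value += lam * f(frozenset(order[:i]))
    return value

# Toy instance: K4 on {1, 2, 3, 4} plus a pendant vertex 5 attached to 4.
E = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]

def edge_count(S):
    """f(S) = |E(S)|, the number of edges with both endpoints in S."""
    return sum(1 for (i, j) in E if i in S and j in S)

x = {1: Fraction(3, 11), 2: Fraction(3, 11), 3: Fraction(2, 11),
     4: Fraction(2, 11), 5: Fraction(1, 11)}
extension_value = lovasz_extension(edge_count, x)
min_form_value = sum(min(x[i], x[j]) for (i, j) in E)
```

Exact rational arithmetic is used so that the two values can be compared with equality rather than a tolerance.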

The Lovász Extension enjoys the following properties:
■ $\mathcal{L}_f$ is concave iff $f$ is supermodular.

■ If $f$ is supermodular, the maximum value of $f(S)$ is the same as the maximum value of
$\mathcal{L}_f(x)$.

■ Restricted to the subspace $x_1 \ge x_2 \ge \dots \ge x_n$, the function $\mathcal{L}_f$ is linear.

We are now ready to describe our convex program CP for computing the densest subset 
of the universe U subject to dependency constraints. For details on convex programming, 
one may consult the text @]. 

The program has variables $x_1, x_2, \dots, x_n$ corresponding to the elements $i \in U$. Since
$f(S)$ is supermodular, the corresponding Lovász Extension $\mathcal{L}_f(x)$ is concave.



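\[
\text{CP}:\quad
\begin{array}{lll}
\max & \mathcal{L}_f(x) & \\
\text{s.t.} & \langle x, \mathbf{1} \rangle = 1 & \\
 & x_i \le x_j & \text{for all } (i,j) \in A \\
 & x_i \ge 0 & \text{for all } i \in U
\end{array}
\]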

This convex programming problem can be solved to arbitrary precision in polynomial 
time by the ellipsoid method (see [H]). 

As in the first proof above, we will relabel the elements of the universe so that, for an
optimal solution $x^*$, we have $x^*_1 \ge x^*_2 \ge \dots \ge x^*_n$. But now, by the property of the
Lovász Extension, we see that $\mathcal{L}_f(x)$ is a linear function in this subspace.

Now, the rest of the first proof carries over and gives us the result for arbitrary monotone
supermodular $f$.
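For small instances, the output of any implementation of this result can be cross-checked against exhaustive search. The sketch below is such a baseline only and is not the algorithm of this section: it takes exponential time, and the name `densest_closed_subset` and the toy data are hypothetical. It enumerates the nonempty subsets that respect the dependency arcs and returns one of maximum density.

```python
from fractions import Fraction
from itertools import combinations

def densest_closed_subset(universe, arcs, f):
    """Brute-force baseline: densest nonempty S with (i in S => j in S) per arc.

    f must return integers (or Fractions) so that densities stay exact.
    """
    best_set, best_density = None, None
    for k in range(1, len(universe) + 1):
        for subset in combinations(universe, k):
            S = frozenset(subset)
            # Skip S if some arc (i, j) has i inside S but j outside.
            if any(i in S and j not in S for (i, j) in arcs):
                continue
            density = Fraction(f(S), len(S))
            if best_density is None or density > best_density:
                best_set, best_density = S, density
    return best_set, best_density

# Toy instance: K4 on {1, 2, 3, 4} plus pendant vertex 5; element 1 depends on 5.
U = [1, 2, 3, 4, 5]
E = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
A = [(1, 5)]

def edge_count(S):
    """A monotone supermodular f: the number of edges inside S."""
    return sum(1 for (i, j) in E if i in S and j in S)

H, dH = densest_closed_subset(U, A, edge_count)
```

Without the arc, the copy of $K_4$ would be optimal; the dependency of element 1 on element 5 forces the pendant element into the solution and lowers the optimum.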

5.2 Proof of Corollary [6] 

There are many ways to see this. One way is to consider the convex program above for the
Lovász Extension of the monotone supermodular function $f$.

\[
\begin{array}{lll}
\max & \mathcal{L}_f(x) & \\
\text{s.t.} & \langle x, \mathbf{1} \rangle = 1 & \\
 & x_i \ge x_{i+1} & \text{for all } i \in \{1, 2, \dots, (n-1)\} \\
 & x_n \ge 0 &
\end{array}
\]

As in the proof above, we can see that this has an optimal solution $x^*$ in which all the
nonzero $x^*_i$'s are equal, and that this corresponds to choosing a subset $S$ so that
$\mathcal{L}_f(x^*) = f(S)/|S|$. Thus, we see that $\max_S f(S)/|S|$ is computable in polynomial time.

Yet another way of verifying Corollary [5] is to consider the family of functions
$g(\alpha, S) = f(S) - \alpha|S|$ (for fixed $\alpha \ge 0$). Note that each $g(\alpha, \cdot)$ is supermodular for
any fixed $\alpha$, and so can be maximized in polynomial time. Also observe that if
$\max_S f(S)/|S| \ge \alpha$ for some $\alpha$, then $\max_S g(\alpha, S) \ge 0$. Conversely, if
$\max_S f(S)/|S| \le \alpha$, then $\max_S g(\alpha, S) \le 0$ (here the maxima range over nonempty $S$).
Thus, we can find $\max_S f(S)/|S|$ by a binary search over $\alpha$, maximizing the corresponding
functions $g(\alpha, S)$.
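The binary search can be sketched as follows. This is a minimal illustration under assumptions: the oracle `max_g` is an exponential-time brute-force stand-in for the polynomial-time supermodular maximization mentioned above, it ranges over nonempty sets only, and the names `max_g`, `max_density` and the parameter `iters` are hypothetical. The initial upper bound uses monotonicity, which gives $f(S)/|S| \le f(U)$.

```python
from fractions import Fraction
from itertools import combinations

def max_g(f, universe, alpha):
    """Stand-in oracle: max of g(alpha, S) = f(S) - alpha*|S| over nonempty S."""
    return max(f(frozenset(S)) - alpha * len(S)
               for k in range(1, len(universe) + 1)
               for S in combinations(universe, k))

def max_density(f, universe, iters=50):
    """Binary search for max_S f(S)/|S|; returns a lower bound within f(U)/2**iters."""
    lo, hi = Fraction(0), Fraction(f(frozenset(universe)))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if max_g(f, universe, mid) >= 0:
            lo = mid  # some nonempty S has density at least mid
        else:
            hi = mid  # every nonempty S has density below mid
    return lo

# Toy instance: K4 on {1, 2, 3, 4} plus a pendant vertex 5 attached to 4.
U = [1, 2, 3, 4, 5]
E = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]

def edge_count(S):
    """A monotone supermodular f: the number of edges inside S."""
    return sum(1 for (i, j) in E if i in S and j in S)

approx = max_density(edge_count, U)
```

Since the search only narrows an interval around the optimum, an exact answer would still require a final rounding step, for example by reading off a maximizer of $g(\alpha, \cdot)$ at the last value of $\alpha$.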

6 Proof of Theorem [4]

To fix the notation, in this problem, we are given a monotone supermodular function $f$ over
a universe $U$, a matroid $\mathcal{M} = (U, \mathcal{I})$, and a set $A \subseteq U$.



The only modification that we have to make to Algorithm [T] is that we will choose the first
set $H_1$ such that $H_1$ maximizes the density among all subsets that contain the set $A$. Note
that we can do this in polynomial time. Apart from this, the construction of the sets
$H_2, H_3, \dots, H_\ell$ and the sets $D_1, D_2, \dots, D_\ell$ is the same as in Algorithm Q]. So, each $D_i$
contains $A$, and the candidate feasible solutions $D'_i$ also contain $A$. The analysis of the
modified algorithm is the same as in Section [3]. Thus we obtain a 2-approximation algorithm
as promised in Theorem [2].

7 Open Problems 

One interesting open direction is to investigate the maximization of density functions subject 
to combinations of constraints. In this paper, we consider the combination of a single matroid 
and a subset constraint. In general, one could ask similar questions about combinations of 
multiple matroid constraints or a matroid and a dependency constraint for instance. Another 
open question is to derive an LP-based technique to prove the result in Theorem [1].